1
00:00:00,790 --> 00:00:07,320
[Music]

2
00:00:13,150 --> 00:00:09,270
[Applause]

3
00:00:15,520 --> 00:00:13,160
hi I'm olive I'm an undergrad at Carleton

4
00:00:18,609 --> 00:00:15,530
College working with Rika Anderson and

5
00:00:20,019 --> 00:00:18,619
my talk is about the Sulfurovum pangenome

6
00:00:23,589 --> 00:00:20,029
and its evolution in deep sea

7
00:00:25,690 --> 00:00:23,599
hydrothermal vents so I would just like

8
00:00:27,370 --> 00:00:25,700
to reiterate how important it is to

9
00:00:29,589 --> 00:00:27,380
study hydrothermal vent systems

10
00:00:32,019 --> 00:00:29,599
hydrothermal vents are one of the most

11
00:00:35,440 --> 00:00:32,029
ancient continuously inhabited ecosystems

12
00:00:37,150 --> 00:00:35,450
on Earth and some researchers

13
00:00:39,700 --> 00:00:37,160
believe that hydrothermal vents were the

14
00:00:41,770 --> 00:00:39,710
place life arose and if that's true then

15
00:00:45,099 --> 00:00:41,780
life diversified into these two lineages as

16
00:00:49,660 --> 00:00:45,109
we know bacteria and archaea and they

17
00:00:51,700 --> 00:00:49,670
invented new metabolic pathways and also

18
00:00:53,439 --> 00:00:51,710
they, the bacteria and also the archaea, are

19
00:00:55,630 --> 00:00:53,449
everywhere like they also occupy new

20
00:00:57,430 --> 00:00:55,640
niches so that they spread out

21
00:01:05,140 --> 00:00:57,440
throughout the entire earth until now

22
00:01:08,200 --> 00:01:05,150
and so along that idea I would like to

23
00:01:10,240 --> 00:01:08,210
study microbial evolution and I think

24
00:01:12,640 --> 00:01:10,250
there is no more interesting place to

25
00:01:14,470 --> 00:01:12,650
study it than in hydrothermal vents

26
00:01:17,170 --> 00:01:14,480
because this is one of the key habitats in

27
00:01:18,880 --> 00:01:17,180
life's earliest stages and I have a few

28
00:01:21,250 --> 00:01:18,890
questions about microbial evolution and

29
00:01:23,200 --> 00:01:21,260
deep sea hydrothermal vents the first

30
00:01:26,860 --> 00:01:23,210
one what is the genomic variation

31
00:01:29,350 --> 00:01:26,870
that exists in this system and then why

32
00:01:33,010 --> 00:01:29,360
what drives this evolution is it

33
00:01:37,540 --> 00:01:33,020
chance or drift or is it necessity

34
00:01:41,430 --> 00:01:37,550
which is selection and lastly how do the

35
00:01:44,380 --> 00:01:41,440
microbes in hydrothermal vents diversify

36
00:01:46,720 --> 00:01:44,390
so first most of this talk is going to

37
00:01:48,580 --> 00:01:46,730
be about population genetics and when

38
00:01:51,280 --> 00:01:48,590
I'm thinking about population genetics I

39
00:01:54,280 --> 00:01:51,290
always think of the four phenomena or

40
00:01:56,560 --> 00:01:54,290
evolutionary forces such as selection

41
00:01:58,600 --> 00:01:56,570
which is the necessity mutation

42
00:02:03,520 --> 00:01:58,610
migration and drift which is the chance

43
00:02:05,050 --> 00:02:03,530
and how they interact to give

44
00:02:06,300 --> 00:02:05,060
rise to this variation that we see in a

45
00:02:10,410 --> 00:02:06,310
population

46
00:02:13,690 --> 00:02:10,420
unlike in multicellular organisms in

47
00:02:16,000 --> 00:02:13,700
microbes the variation can take a whole

48
00:02:20,289 --> 00:02:16,010
new meaning so not only that their

49
00:02:22,500 --> 00:02:20,299
sequence can be diverse the genomic

50
00:02:26,610 --> 00:02:22,510
content can also vary between this

51
00:02:29,369 --> 00:02:26,620
and a specific species so some genes are

52
00:02:31,649 --> 00:02:29,379
shared by the whole species which is the

53
00:02:34,080 --> 00:02:31,659
core genes and some genes are not which

54
00:02:36,330 --> 00:02:34,090
is the accessory genes and there are

55
00:02:38,820 --> 00:02:36,340
multiple hypotheses for how these accessory

56
00:02:41,100 --> 00:02:38,830
genes are acquired including horizontal

57
00:02:44,729 --> 00:02:41,110
gene transfer and also large-scale

58
00:02:47,699 --> 00:02:44,739
deletion but I'm more interested in how

59
00:02:51,030 --> 00:02:47,709
these accessory genes are maintained

60
00:02:53,729 --> 00:02:51,040
throughout evolution so there has

61
00:02:56,640 --> 00:02:53,739
been some literature debate on this

62
00:02:59,520 --> 00:02:56,650
either it is through selection which is the

63
00:03:03,509 --> 00:02:59,530
necessity or some papers also say that

64
00:03:07,080 --> 00:03:03,519
it is through drift which is chance or a

65
00:03:10,740 --> 00:03:07,090
little bit of both and to tie it back to

66
00:03:12,270 --> 00:03:10,750
our extreme ancient environment I would

67
00:03:16,140 --> 00:03:12,280
like to study this pangenome evolution

68
00:03:18,509 --> 00:03:16,150
in the deep sea hydrothermal vents and these

69
00:03:21,449 --> 00:03:18,519
are my two study sites the first one is

70
00:03:23,849 --> 00:03:21,459
the Mid-Cayman Rise which is in the

71
00:03:27,000 --> 00:03:23,859
Caribbean Sea and these samples were

72
00:03:31,050 --> 00:03:27,010
taken in 2012 2013 by Julie Huber's group

73
00:03:33,390 --> 00:03:31,060
and the second one is Axial Seamount which

74
00:03:37,110 --> 00:03:33,400
is close to the Juan de Fuca plate and

75
00:03:40,620 --> 00:03:37,120
these samples were taken in 2013 2015

76
00:03:43,099 --> 00:03:40,630
but also by Julie Huber's lab so

77
00:03:46,830 --> 00:03:43,109
because a lot of these microbes were

78
00:03:49,140 --> 00:03:46,840
unculturable we turned to

79
00:03:50,699 --> 00:03:49,150
metagenomic sequencing and I followed

80
00:03:53,490 --> 00:03:50,709
like the basic metagenomics analysis

81
00:03:57,750 --> 00:03:53,500
workflow so that's assembly mapping and

82
00:04:00,020 --> 00:03:57,760
all that stuff and finally I did the

83
00:04:04,770 --> 00:04:00,030
binning which is the interesting part

84
00:04:08,069 --> 00:04:04,780
and basically I binned my contigs based

85
00:04:12,720 --> 00:04:08,079
on the GC content and coverage mostly

86
00:04:14,640 --> 00:04:12,730
and then I also found like what taxa

87
00:04:17,940 --> 00:04:14,650
they belong to what taxa the bins

88
00:04:19,949 --> 00:04:17,950
belong to and for this purpose I'm

89
00:04:22,950 --> 00:04:19,959
mostly interested in the most abundant

90
00:04:25,560 --> 00:04:22,960
taxon at the genus level that I found

91
00:04:28,020 --> 00:04:25,570
in my samples which is Sulfurovum so

92
00:04:30,659 --> 00:04:28,030
Sulfurovum is a sulfur oxidizing bacterium in

93
00:04:34,080 --> 00:04:30,669
the hydrothermal vents and from the bin

94
00:04:36,420 --> 00:04:34,090
recovered Sulfurovum genomes that I

95
00:04:39,540 --> 00:04:36,430
had I created a pan genome profile

96
00:04:44,700 --> 00:04:39,550
which is what you see here so each of

97
00:04:49,740 --> 00:04:44,710
these layers is a Sulfurovum genome and

98
00:04:51,480 --> 00:04:49,750
then each of these sorry each of these bars

99
00:04:53,490 --> 00:04:51,490
represents like the gene group so if the

100
00:05:01,620 --> 00:04:53,500
gene group is there then the bar exists on

101
00:05:03,360 --> 00:05:01,630
that layer and vice versa so now I'm

102
00:05:05,580 --> 00:05:03,370
interested in like what kind of genes

103
00:05:09,540 --> 00:05:05,590
there are in the Sulfurovum pangenome

104
00:05:12,090 --> 00:05:09,550
so and also I'm interested in how those

105
00:05:16,560 --> 00:05:12,100
functions are distributed across the

106
00:05:19,620 --> 00:05:16,570
gene frequency so first here that each

107
00:05:22,230 --> 00:05:19,630
data point is a gene group and the color

108
00:05:24,540 --> 00:05:22,240
bar basically means the gene function

109
00:05:26,610 --> 00:05:24,550
and on the x axis I have the gene

110
00:05:29,250 --> 00:05:26,620
frequency from genes contained in only one

111
00:05:32,460 --> 00:05:29,260
genome to the ones in 22 genomes which is

112
00:05:34,499 --> 00:05:32,470
the core genome basically and on the y

113
00:05:36,810 --> 00:05:34,509
axis I have the proportion of that gene

114
00:05:39,800 --> 00:05:36,820
function across a column so the most

115
00:05:42,990 --> 00:05:39,810
important trend here is that there is this

116
00:05:44,700 --> 00:05:43,000
increase in proportion for translation

117
00:05:47,400 --> 00:05:44,710
coenzyme metabolism and amino acid

118
00:05:50,100 --> 00:05:47,410
metabolism functions across the gene

119
00:05:52,890 --> 00:05:50,110
frequency and the takeaway here is that

120
00:05:54,659 --> 00:05:52,900
the housekeeping functions are basically

121
00:05:56,790 --> 00:05:54,669
more enriched in the core genome versus

122
00:06:00,120 --> 00:05:56,800
the accessory genome which makes sense

123
00:06:02,879 --> 00:06:00,130
but the opposite is also true for the

124
00:06:05,760 --> 00:06:02,889
environment related signaling genes such

125
00:06:08,100 --> 00:06:05,770
as signal transduction and so on so here

126
00:06:10,560 --> 00:06:08,110
I like to point out that the accessory

127
00:06:13,649 --> 00:06:10,570
genome acquisition and

128
00:06:16,110 --> 00:06:13,659
maintenance seems to be not random based

129
00:06:17,790 --> 00:06:16,120
on functions and so this kind of

130
00:06:23,520 --> 00:06:17,800
like points toward the selection

131
00:06:25,500 --> 00:06:23,530
rather than the chance case then if pan

132
00:06:27,120 --> 00:06:25,510
genome evolution is really driven by

133
00:06:29,189 --> 00:06:27,130
selection I would like to know what kind

134
00:06:31,980 --> 00:06:29,199
of selective pressure exists in this

135
00:06:33,600 --> 00:06:31,990
environment and I would also like to

136
00:06:37,350 --> 00:06:33,610
know like if there is any local

137
00:06:39,420 --> 00:06:37,360
adaptation of these pangenomes so here I

138
00:06:41,730 --> 00:06:39,430
realized that there are two environments

139
00:06:45,529 --> 00:06:41,740
that my samples came from the Mid-Cayman

140
00:06:48,540 --> 00:06:45,539
Rise vent and the Axial vent so and

141
00:06:49,800 --> 00:06:48,550
they're really separated by the

142
00:06:53,810 --> 00:06:49,810
continent so

143
00:06:56,610 --> 00:06:53,820
separate and so I calculated the

144
00:06:58,620 --> 00:06:56,620
proportion for each gene I calculated the

145
00:07:02,010 --> 00:06:58,630
proportion of that being found in only

146
00:07:05,760 --> 00:07:02,020
Axial genomes and then I sorted them

147
00:07:09,090 --> 00:07:05,770
from lowest to highest and these are

148
00:07:10,860 --> 00:07:09,100
basically the least represented genes in

149
00:07:12,420 --> 00:07:10,870
the Axial genomes so they're mostly

150
00:07:16,740 --> 00:07:12,430
represented in only Mid-Cayman Rise

151
00:07:19,260 --> 00:07:16,750
genomes and most of these genes belong to

152
00:07:22,620 --> 00:07:19,270
the blue category which is the ion

153
00:07:24,659 --> 00:07:22,630
transport and metabolism category and most

154
00:07:26,400 --> 00:07:24,669
interestingly they're also mostly

155
00:07:29,070 --> 00:07:26,410
related to phosphate uptake and

156
00:07:31,260 --> 00:07:29,080
regulation which means that phosphate

157
00:07:33,930 --> 00:07:31,270
related genes are more represented

158
00:07:36,960 --> 00:07:33,940
in the Mid-Cayman Rise genomes versus

159
00:07:39,510 --> 00:07:36,970
the Axial genomes this is interesting

160
00:07:41,909 --> 00:07:39,520
because in the Atlantic Ocean where the

161
00:07:44,700 --> 00:07:41,919
Mid-Cayman Rise is the phosphate content is

162
00:07:47,610 --> 00:07:44,710
lower than the Pacific sorry in the Atlantic

163
00:07:49,800 --> 00:07:47,620
than in the pacific ocean to me this

164
00:07:51,900 --> 00:07:49,810
means that microbes that live in the

165
00:07:54,570 --> 00:07:51,910
Mid-Cayman Rise could potentially have to

166
00:07:57,600 --> 00:07:54,580
innovate due to this phosphate-lacking

167
00:07:58,890 --> 00:07:57,610
environment by maintaining the accessory

168
00:08:02,190 --> 00:07:58,900
genes that they got through horizontal

169
00:08:04,170 --> 00:08:02,200
gene transfer and this result is

170
00:08:08,730 --> 00:08:04,180
actually pretty similar to what Maureen

171
00:08:12,150 --> 00:08:08,740
Coleman found in the Prochlorococcus in

172
00:08:15,240 --> 00:08:12,160
the surface ocean and just like i would

173
00:08:18,500 --> 00:08:15,250
just like to throw it out there

174
00:08:21,779 --> 00:08:18,510
because i also found a lot of arsenate

175
00:08:25,170 --> 00:08:21,789
related genes in the Mid-Cayman Rise

176
00:08:27,810 --> 00:08:25,180
compared to Axial genomes and

177
00:08:34,490 --> 00:08:27,820
this is also what the Prochlorococcus

178
00:08:37,920 --> 00:08:34,500
paper found in the surface ocean and i

179
00:08:40,140 --> 00:08:37,930
we also looked at the pN/pS ratio which

180
00:08:43,920 --> 00:08:40,150
kind of like suggests the strength of

181
00:08:46,200 --> 00:08:43,930
evolution on each gene I looked at um so

182
00:08:48,540 --> 00:08:46,210
basically if the pN/pS ratio is

183
00:08:51,240 --> 00:08:48,550
higher than 1 then it suggests some

184
00:08:54,990 --> 00:08:51,250
adaptive evolution or positive evolution

185
00:08:57,840 --> 00:08:55,000
and if the pN/pS ratio is closer to 0

186
00:09:00,120 --> 00:08:57,850
then negative selection or purifying

187
00:09:03,010 --> 00:09:00,130
selection which is more of conservation

188
00:09:06,400 --> 00:09:03,020
than change happens on that gene

189
00:09:08,980 --> 00:09:06,410
what I found here is that accessory

190
00:09:11,650 --> 00:09:08,990
genes tend to have higher pN/pS ratios

191
00:09:13,780 --> 00:09:11,660
than the core genes and some of these

192
00:09:17,140 --> 00:09:13,790
genes also have pN/pS ratios higher

193
00:09:17,860 --> 00:09:17,150
than one but I'm not really entirely

194
00:09:21,130 --> 00:09:17,870
sure

195
00:09:23,140 --> 00:09:21,140
statistically here because these were not

196
00:09:25,870 --> 00:09:23,150
you I didn't really calculate the pN/pS

197
00:09:28,870 --> 00:09:25,880
ratios using a maximum likelihood

198
00:09:32,440 --> 00:09:28,880
model but just like kind of like a point

199
00:09:35,350 --> 00:09:32,450
estimate for each gene so like this

200
00:09:38,020 --> 00:09:35,360
higher pN/pS ratio for accessory genes

201
00:09:40,420 --> 00:09:38,030
might be due to either adaptive

202
00:09:44,100 --> 00:09:40,430
evolution on these genes or that

203
00:09:46,000 --> 00:09:44,110
these genes are less likely to undergo

204
00:09:53,410 --> 00:09:46,010
negative selection or purifying

205
00:09:55,030 --> 00:09:53,420
selection so in conclusion we kind of

206
00:09:58,630 --> 00:09:55,040
saw that pangenome evolution is

207
00:10:00,520 --> 00:09:58,640
selective and we had several evidence of

208
00:10:02,830 --> 00:10:00,530
this the first one is that different

209
00:10:05,230 --> 00:10:02,840
gene categories were enriched in core

210
00:10:08,050 --> 00:10:05,240
versus accessory genomes which was the

211
00:10:10,300 --> 00:10:08,060
first one and we also saw some local

212
00:10:12,610 --> 00:10:10,310
adaptation potentially due to phosphate

213
00:10:15,870 --> 00:10:12,620
content difference between the Mid-Cayman

214
00:10:18,970 --> 00:10:15,880
Rise and the Axial environment

215
00:10:21,070 --> 00:10:18,980
and then we also saw that there was

216
00:10:23,530 --> 00:10:21,080
higher probability of adaptive evolution

217
00:10:26,290 --> 00:10:23,540
in the accessory genome compared to the

218
00:10:28,590 --> 00:10:26,300
core genome or that there was some

219
00:10:31,300 --> 00:10:28,600
different evolutionary scheme of

220
00:10:34,300 --> 00:10:31,310
accessory genome compared to the core

221
00:10:36,580 --> 00:10:34,310
genome and finally even though I didn't

222
00:10:38,950 --> 00:10:36,590
really mention it here we saw some

223
00:10:42,130 --> 00:10:38,960
evidence of gene specific sweeps which

224
00:10:46,480 --> 00:10:42,140
kind of points to selection that happens

225
00:10:49,990 --> 00:10:46,490
in these microbial populations and some

226
00:10:53,080 --> 00:10:50,000
more of bigger-picture conclusions we

227
00:10:56,920 --> 00:10:53,090
saw that necessity was the key factor

228
00:10:59,980 --> 00:10:56,930
in pangenome evolution in hydrothermal

229
00:11:01,750 --> 00:10:59,990
vents compared to chance and there is

230
00:11:04,960 --> 00:11:01,760
still an open question how important

231
00:11:06,580 --> 00:11:04,970
necessity was in early life the genomes

232
00:11:09,160 --> 00:11:06,590
we see today have been molded by

233
00:11:12,790 --> 00:11:09,170
evolution from genomes back then so we

234
00:11:15,220 --> 00:11:12,800
would infer some connection through that

235
00:11:16,870 --> 00:11:15,230
and finally I would also point out that

236
00:11:18,370 --> 00:11:16,880
pangenome variation

237
00:11:22,840 --> 00:11:18,380
the importance of studying pangenomic

238
00:11:25,329 --> 00:11:22,850
variation in addition to just single

239
00:11:27,340 --> 00:11:25,339
point polymorphisms because pangenomic

240
00:11:30,249 --> 00:11:27,350
variation is really widespread and it

241
00:11:31,780 --> 00:11:30,259
also takes into account the one

242
00:11:34,389 --> 00:11:31,790
evolutionary force which is horizontal

243
00:11:36,280 --> 00:11:34,399
gene transfer that is not really taken

244
00:11:40,030 --> 00:11:36,290
into account by just single point

245
00:11:42,550 --> 00:11:40,040
polymorphism and by that I would like to

246
00:11:45,610 --> 00:11:42,560
thank the Anderson lab at Carleton and

247
00:11:53,829 --> 00:11:45,620
all the crews that did the sampling

248
00:11:56,740 --> 00:11:53,839
and the funding thank you okay do we

249
00:12:43,980 --> 00:11:56,750
have any questions for all of you can

250
00:12:52,329 --> 00:12:50,139
yeah so yes so the question was the

251
00:12:55,720 --> 00:12:52,339
connection between extremophiles and

252
00:12:58,210 --> 00:12:55,730
genome evolution part yeah like when I

253
00:13:00,610 --> 00:12:58,220
was getting into the project I didn't

254
00:13:02,740 --> 00:13:00,620
really care about the extremophile part because I

255
00:13:05,319 --> 00:13:02,750
was really looking into the evolution of

256
00:13:08,379 --> 00:13:05,329
pan genomes and so it wasn't necessarily

257
00:13:11,170 --> 00:13:08,389
like into just like specific to a pan

258
00:13:13,650 --> 00:13:11,180
genome and I I realized that like

259
00:13:16,749 --> 00:13:13,660
studying extremophiles is probably not the best

260
00:13:19,059 --> 00:13:16,759
like place to do like you know evolution

261
00:13:27,350 --> 00:13:19,069
study but yeah that was the data that I

262
00:13:34,050 --> 00:13:30,420
great talk I'm I'm all I'm a postdoc at

263
00:13:36,390 --> 00:13:34,060
Ames drift the chance drift is usually

264
00:13:39,150 --> 00:13:36,400
a pretty strong function of population

265
00:13:41,550 --> 00:13:39,160
size yeah I wonder if there's a way

266
00:13:47,490 --> 00:13:41,560
in your data set to estimate population

267
00:13:55,380 --> 00:13:47,500
size yeah so I so first we kind of like

268
00:13:58,110 --> 00:13:55,390
had some like the time series data set

269
00:14:00,800 --> 00:13:58,120
as well and we kind of like I guess like

270
00:14:05,010 --> 00:14:00,810
I saw that there was like a decrease of

271
00:14:09,480 --> 00:14:05,020
like the coverage from year to year

272
00:14:12,390 --> 00:14:09,490
but I don't really know like the the what's

273
00:14:14,220 --> 00:14:12,400
it called the absolute population size

274
00:14:16,590 --> 00:14:14,230
or like the effective population size for

275
00:14:18,290 --> 00:14:16,600
the populations that I have so yeah

276
00:14:21,450 --> 00:14:18,300
probably

277
00:14:26,520 --> 00:14:21,460
there are like some ways to estimate that

278
00:14:30,000 --> 00:14:26,530
using just coverage but yeah sure thank

279
00:14:30,620 --> 00:14:30,010
you very much thank you everyone for